﻿<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="description" content="HANOI, November 10-13, 2013. The IEEE RIVF International Conference on Computing and Communication Technologies, RIVF2013 RIVF 2013" />
<meta name="keywords" content="ieee, rivf, rivf2013, rivf 2013, international conference, computing and communication, technologies, uet, uos, ifi, 10th ieee, engineering, science, vnu, 2013 nov, hanoi, vietnam, information management, computational intelligence, communications and networking, modeling and computer simulation, applied operational research and optimization" />
<meta name="author" content="metatags generator">
<meta name="robots" content="index, follow" />
<meta name="revisit-after" content="3 days" />
<title>RIVF-2013</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link href="styles.css" rel="stylesheet" type="text/css" />
<style type="text/css">
<!--
.style2 {color: #0000CC}
.style4 {
	color: #FFFF00;
	font-weight: bold;
	font-size: 16px;
}
.style5 {font-size: 14px}

pre {
	font: normal 12px Arial, Helvetica, sans-serif;
}
-->
</style>
<style type="text/css">
<!--
p.MsoNormal {
margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";
}
-->
</style>
</head>
<body>
<div class="size">
  <div class="content">
   <div>
     <div class="mnav"> <img class="mnimgl" src="image/img_37.jpg" alt="" /> <img class="mnimgr" src="image/img_50.jpg" alt="" />
       <div class="mnm">
         <ul>
           <li><a href="#">Home</a></li>
           <li><a href="CFP.html">Call For Papers</a></li>
           <li><a href="programme.html">Programme <img src="image/new.gif" alt="" /></a></li>
           <li><a href="submission.html">Submission</a></li>
           <li><a href="registration.html">Registration </a></li>
           <li><a href="contact.html">Contact Us </a></li>
         </ul>
       </div>
     </div>
    
    </div>
    <div class="cmainimg"><img alt="" src="image/rivf2013.jpg" width="100%" /></div>
    <div class="mcontent">
      <div class="mcleft">
        <div class="mcbox01">
          <div class="mc01t2"> MAIN MENU </div>
          <div class="mc01c">
            <ul>
              <li><a href="index.html">Home</a></li>
              <li><a href="CFP.html">Call For Papers</a></li>
			  <li><a href="steeringcommittee.html">Steering Committee</a></li>
              <li><a href="committee.html">Conference Committee</a></li><li><a href="venue.html">Venue </a></li>
              <li><a href="importantdates.html">Important Dates </a></li>
              <li><a href="programme.html">Programme <img src="image/new.gif"></a></li>
              <li><a href="speakers.html">Keynote talks </a></li>
              <li><a href="tutorials.html">Tutorials</a></li><li><a href="vlsp-workshop.html">VLSP Workshop</a></li><li><a href="cb-workshop.html">CB Workshop</a></li><li><a href="campaign.html">VLSP Campaign </a></li>
              <li><a href="submission.html">Submission</a></li><li><a href="finalsubmission.html">Final Submission </a></li><li><a href="http://uet.vnu.edu.vn/sis/?q=en/rivf2013ecopyright">Copyright</a></li>
              <li><a href="acceptedpapers.html">List of Accepted Papers </a></li>
              <li><a href="registration.html">Registration </a></li>
              <li><a href="associate.html">Associate Events</a></li>
              <li><a href="location.html">Location </a></li>
              <li><a href="photos.html">Conference Photos </a></li>
              <li><a href="usefulinfo.html">Useful Information </a></li>
              <li><a href="accommodation.html">Accommodation</a></li>
              <li><a href="tours.html">Tours </a></li>	
              <li><a href="local.html">Local Information</a></li>	
              <li><a href="contact.html">Contact</a></li>
            </ul>
          </div>
          <div class="mc01b"> <img src="image/img_272.jpg" alt="" /> </div>
        </div>
        <div class="mcbox01">
        <div align="center">
          <div class="mc01t2"> SPONSORS </div>
          <div class="mc01cc">
            <div class="mcl02 style2">
              <div align="center">
              <p><img src="image/uet.png" alt="uet" width="90" height="90" /></p>
              <p><strong>UNIVERSITY OF ENGINEERING &amp; TECHNOLOGY </strong></p>
              <p><img src="image/uos.png" width="79" height="100"/></p>
              <p><strong>UNIVERSITY OF SCIENCE</strong></p>
              <p><img src="image/hnue.png" /></p>
              <p><strong>HANOI NATIONAL UNIVERSITY OF EDUCATION</strong></p>
              <p><img src="image/jaist.gif" width="180" /></p>
               <p><img src="image/ifi.png" width="182" height="106"/></p><p><img src="image/fpt.png" width="180"/></p><p><img src="image/onrg.png" width="180"/></p><p><img src="image/afosr.png" width="150"/></p><p><img src="image/aoard.png" width="150"/></p>
              </div> </div>
          </div> </div>
          <div class="mc01b"> <img src="image/img_272.jpg" alt="" /> </div>
        </div>
        <div class="mcbox01">
          <div class="mc01t2"> CONFERENCE COMMITTEE </div>
          <div class="mc01cc">
            <div class="mcl02 style2">Honorary Chairs</div>
            Dinh-Tri Nguyen, VNU-IFI, Vietnam<br />
            Roberto deMarca, IEEE President<br />

            <div class="mcl02 style2">General Chairs</div>
            Tu-Bao Ho, JAIST, Japan <br />
            Piuri Vincenzo, Milan University, Italy<br />
            
			<div class="mcl02 style2">Programme Chairs</div>
            Thanh-Thuy Nguyen, VNU-UET, Vietnam<br />
            Mizuhito Ogawa, JAIST, Japan<br />
            
			<div class="mcl02 style2">Organizing Chairs</div>
            Bao-Son Pham, VNU-UET, Vietnam<br />Anh-Cuong Le, VNU-UET, Vietnam<br/>
	    Cam-Ha Ho, HNUE, Vietnam<br />
            Thi-Minh-Huyen Nguyen, VNU-HUS, Vietnam <br />
            Xuan-Tu Tran, VNU-UET, Vietnam <br />
            Tuong-Vinh Ho, VNU-IFI, Vietnam <br />

			<div class="mcl02 style2">Tutorial Chairs</div>
            Anh-Cuong Le, VNU-UET, Vietnam<br />
            Marc Bui, University Paris 8, France<br />

			<div class="mcl02 style2">Workshop Chairs</div>
            Hi-Duc Pham, ECE, France<br />
            Thi-Ha-Duong Phan, VAST, Vietnam<br />

			<div class="mcl02 style2">Publication Chairs</div>
            Xuan-Tu Tran, VNU-UET, Vietnam<br />
	    Xuan-Hieu Phan, VNU-UET, Vietnam<br />
            Bao-Quoc Ho, VNU-HCMUS, Vietnam<br />

			<div class="mcl02 style2">International Advisory Committee</div>
            Nim Cheung, Hong Kong, China<br />
            John Vig, IEEE, USA<br />
            Janina Mazierska, Massey University, New Zealand<br />
            Dinh-Tri Nguyen, VNU-IFI, Vietnam<br />
            Byeong Ji Lee, Seoul National University, Korea<br />
            Jean-Marc Steyeart, Ecole Polytechnique, France<br />
			Takuya Katayama, JAIST, Japan<br />
			
			<div class="mcl02 style2">Conference Organizers</div>
            IEEE Vietnam Section<br />
            VNU University of Engineering and Technology (VNU-UET)<br />
            VNU Institut Francophonie de l'Informatique (VNU-IFI)<br />
            Hanoi University of Education<br />

			<div class="mcl02 style2">Technical Co-sponsors (incomplete list)</div>
            IEEE Communications Society<br />
            IEEE Computational Intelligence Society<br />
            

            </div>
 <div class="mc01b"> <img src="image/img_272.jpg" alt="" /></div>
        </div>
      </div>
      <div class="mcright">
        <div class="mc01">
          <div class="mc0102"> 
                     <div class="mcl">
              <div class="mcl01"> 
                <div align="center">The first Evaluation Campaign on Vietnamese Language Processing</div><br>
              </div>
            </div>
          </div>
                   <p>&nbsp;</p>
                   <p>&nbsp;</p>
</div>
         
          <div class="mc0203">
            <div class="mc020301">
              <table cellspacing="0" cellpadding="0">
                <tr>
                  <td>
                    <p><strong>Home</strong><br /><br />
                     This first VLSP evaluation campaign covers two tasks. The first concerns the essential tools for Vietnamese language processing, namely word segmentation and POS tagging. The second concerns one of the most important NLP applications, machine translation (MT).</p>

					  <p><strong>Organizers</strong><br /><br />
    					Institute of Information Technology, VAST, Vietnam<br>
              Vietnam National University, Hanoi, Vietnam 
            </p>


					  <p><strong>Sponsors</strong><br><br>
              <i>This evaluation campaign would not be possible without the following institutions, which contribute their staff to data preparation and evaluation activities:</i></p>
              <p>
              - Institute of Information Technology, VAST, Vietnam<br>
              - Vietnam National University, Hanoi, Vietnam <br>
              - Vietnam Lexicography Center<br>
              </p><p>
              We welcome additional sponsors from institutions and companies interested in Vietnamese language processing.
            </p>

            

            <p><strong>Publications</strong></p>
            <p>Participants in the evaluation campaign will be asked to describe their system in a dedicated paper. All well-formed system papers will be accepted after review and presented as an oral talk or poster in the scientific paper session.</p>
            <p>After the evaluation campaign, we hope that all participants will collaborate on a scientific paper on the systems and the evaluation results, to be submitted to an international conference or a journal specializing in language resources and evaluation.
            </p>

            <p><strong>Important dates</strong></p>
            <table style="text-align: left; width: 100%; color: #10359D;">
              <tr>
                <td>August 3, 2013</td>
                <td>Registration for the campaign opens</td>
              </tr>
              <tr>
                <td>August 5, 2013</td>
                <td>Distribution of training data for the translation task</td>
              </tr>
              <tr>
                <td><del>August 16, 2013</del><br><b>extended to August 31, 2013</b></td>
                <td>Distribution of training data for the word segmentation and POS tagging task</td>
              </tr>
              <tr>
                <td>September 15, 2013</td>
                <td>Registration for the campaign closes</td>
              </tr>
              <tr>
                <td>October 9, 2013 </td><td>Test data release</td>
              </tr>
              <tr>
                <td>October 13, 2013</td><td>System result submission</td>
              </tr>
              
              <tr>
                <td>October 19, 2013</td><td>Scientific and system paper submission deadline</td>
              </tr>
              <tr>
                <td>October 27, 2013  </td><td>Acceptance notification</td>
              </tr>
              <tr>
                <td>October 28, 2013</td><td>Author registration for the workshop</td>
              </tr>
              <tr>
                <td>November 3, 2013</td><td>Camera-ready paper submission</td>
              </tr>
              <tr>
                <td>November 10, 2013 </td><td>Workshop date</td>
              </tr>
            </table>

            <p><strong>Contact</strong></p>
            <p><i>If you plan to participate in the evaluation campaign or have any questions, please contact:</i></p>
            <p><b>For the WordSeg &amp; POSTag Task:</b>
              <br>
              Dr. Nguyen Phuong Thai<br>
              Email: <a href="mailto:thainp@vnu.edu.vn" target="_blank">thainp@vnu.edu.vn</a>
            </p>
            <p><b>For the Translation Task:</b>
              <br>
              Dr. Le Hai Son<br>
              Email: <a href="mailto:lehaison@ioit.ac.vn" target="_blank">lehaison@ioit.ac.vn</a> 
              <br>
              Dr. Ha Thanh Le<br>
		   Email: <a href="mailto:htle@ioit.ac.vn" target="_blank">htle@ioit.ac.vn</a>
	    </p>

            <p><strong>Mailing list</strong></p>
            <p>If you plan to participate in the evaluation campaign, please subscribe to this mailing list (<a target="_blank" href="mailto:vlsp-eval@googlegroups.com">vlsp-eval@googlegroups.com</a>) to receive up-to-date information about developments in the campaign and to ask questions about the data and the evaluation process.</p>
            <hr>

            <p><strong>WordSeg &amp; POSTag Task</strong></p>
              <p><b>Introduction</b></p>
                <p>Word segmentation and POS tagging are basic yet difficult NLP tasks, especially for isolating languages such as Vietnamese, in which compound words are central to the language and parts of speech are not well defined in the linguistic literature. The national project on Vietnamese Language and Speech Processing (VLSP), successfully completed in 2009, gave researchers fundamental NLP tools and resources, laying the groundwork for further research, development and deployment of useful software applications in the field.</p>
                <p>This campaign aims to evaluate Vietnamese word segmentation and POS tagging systems automatically, to encourage researchers to use and evaluate the resources and tools from the VLSP project, and to promote the most effective methods for these basic Vietnamese processing tasks.</p>
              <p><b>Task Description</b></p>
                <p>This evaluation includes two subtasks: word segmentation and POS tagging.<br><i>Participants in the evaluation can use either:</i></p>
                <ul style="text-align: left; color: #10359D;">
                  <li>exclusively the training data provided by the campaign</li>
                  <li>or all kinds of language resources</li>
                </ul>
                <p><i>and must specify which of those two categories they wish to compete in.</i></p>

              <p><b>Evaluation Metrics</b></p>
              <ul style="text-align: left; color: #10359D;">
                  <li>Word segmentation:<br>
                    <ul style="text-align: left; color: #10359D;">
                      <li>P(recision): (number of words correctly segmented)/(number of words in the system output)</li>
                      <li>R(ecall): (number of words correctly segmented)/(number of words in the reference corpus)</li>
                      <li>F1 measure = 2*P*R/(P+R)</li>
                    </ul>
                  </li>
                  <li>POS tagging:<br>
                    <ul style="text-align: left; color: #10359D;">
                      <li>P(recision): (number of words correctly tagged)/(number of words in the system output)</li>
                      <li>R(ecall): (number of words correctly tagged)/(number of words in the reference corpus)</li>
                      <li>F1 measure = 2*P*R/(P+R)</li>
                    </ul>
                  </li>
                </ul>
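                <p>The precision, recall and F1 definitions above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the campaign's official scorer: it assumes that a word counts as correct when it covers exactly the same syllable positions in the system output and in the reference (and, for POS tagging, also carries the same tag). The function names, the tags N and V, and the sample sentence are illustrative only.</p>
<pre>
```python
def spans(tokens, with_tags=False):
    """Turn (word, tag) pairs into a set of syllable spans.

    Assumption: syllables inside a word are separated by spaces, so a
    word is identified by the syllable positions it covers.
    """
    result = set()
    start = 0
    for word, tag in tokens:
        end = start + len(word.split())
        result.add((start, end, tag) if with_tags else (start, end))
        start = end
    return result


def precision_recall_f1(system_tokens, reference_tokens, with_tags=False):
    """Return (P, R, F1) following the definitions in the list above."""
    system = spans(system_tokens, with_tags)
    reference = spans(reference_tokens, with_tags)
    correct = len(system.intersection(reference))
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: the tags N and V are placeholders, not the campaign tagset.
reference = [("học sinh", "N"), ("học", "V"), ("sinh học", "N")]
system = [("học sinh", "N"), ("học", "N"), ("sinh học", "N")]

segmentation = precision_recall_f1(system, reference)
tagging = precision_recall_f1(system, reference, with_tags=True)
print("segmentation P, R, F1:", segmentation)
print("POS tagging P, R, F1:", tagging)
```
</pre>
                <p>In this example the system segments every word correctly but assigns one wrong tag, so the segmentation scores are perfect while the tagging scores drop. Comparing spans rather than word strings keeps a repeated word from being matched at the wrong position.</p>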
              <p><b>Training and Test Data</b></p>
                <p>Three types of training data are available:</p>
                <ul style="text-align: left; color: #10359D;">
                  <li>Segmented Corpus: This corpus contains sentences extracted from Vietnamese online news and segmented into words (corpus size will be updated soon).</li>
                  <li>POS Tagged Corpus: This corpus contains about 30,000 sentences extracted from Vietnamese online news that are segmented into words, each of which is tagged with its part of speech.</li>
                  <li>Raw Corpus: This corpus contains unannotated sentences extracted from Vietnamese online news (corpus size will be updated soon).</li>
                </ul>
                <p>The test corpora include two types of data: one contains sentences from Vietnamese news, and the other contains sentences from other categories of Vietnamese text.</p>
              <p><b>Data Format</b></p>
              <ul style="text-align: left; color: #10359D;">
                <li>All data will be encoded as UTF-8 plain text.</li>
                <li>The input and output segmented corpora are formatted one word per line. </li>
                <li>The input and output POS tagged corpora are formatted one word per line; each word is separated from its POS tag by a tab character.</li>
                <li>The raw corpus is provided as is.</li>
              </ul>
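              <p>As an illustration of the tagged format described above, the following Python sketch reads and writes "word, tab, tag" lines. It is only a sketch: the function names are hypothetical, the tags N and V are placeholders, and skipping blank lines is an assumption, since the format description does not say how sentence boundaries are marked.</p>
<pre>
```python
def parse_tagged_lines(lines):
    """Parse 'word TAB tag' lines into a list of (word, tag) pairs."""
    tokens = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines carry no token
        # Split on the last tab so the tag is always the final field.
        word, separator, tag = line.rpartition("\t")
        if not separator or not word or not tag:
            raise ValueError(f"line {number}: expected word, tab, tag")
        tokens.append((word, tag))
    return tokens


def format_tagged_lines(tokens):
    """Write (word, tag) pairs back as one 'word TAB tag' line each."""
    return "".join(f"{word}\t{tag}\n" for word, tag in tokens)


# Hypothetical sample; real files would be opened with encoding="utf-8".
sample = "học sinh\tN\nhọc\tV\n"
tokens = parse_tagged_lines(sample.splitlines())
print(tokens)
print(format_tagged_lines(tokens) == sample)
```
</pre>
              <p>Raising an error on a malformed line, rather than skipping it silently, makes it easier to notice formatting problems before results are submitted.</p>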
              <p><b>Data Copyright – Acknowledgments</b></p>
              <p>The annotated corpora provided by the campaign are collected from two sources:</p>
              <p>- VLSP project (<a target="_blank" href="http://vlsp.vietlp.org:8080/demo/">http://vlsp.vietlp.org:8080/demo/</a>)<br>
              - Vietnam Lexicography Center (<a target="_blank" href="http://vietlex.com/">http://vietlex.com/</a>)
              </p>
              <p>Participants should use these data for research purposes only and acknowledge the authorship of these data in their publications. </p>

            <hr>

            <p><strong>Translation Task</strong></p>
              <p><b>Introduction</b></p>
              <p>Machine translation (MT) is one of the long-standing and difficult tasks in NLP. Recently, thanks to rapidly growing amounts of data and computing power, the development of MT systems involving Vietnamese has attracted not only academic institutes but also corporate R&amp;D units, both within Vietnam and abroad.</p>
              <p>This campaign aims to provide a realistic setting for evaluating translation systems both automatically and manually, in order to boost MT research in the Vietnamese language technology community. We also welcome research groups aiming to improve their methods and bring their systems into real-world scenarios. Fostering further domestic and international collaboration in the field is also among our goals.</p>
              <p><b>Task Description</b></p>
                <p>The task is to translate the text of TED talks between English and Vietnamese fully automatically. Participants may translate in only one direction, either from English to Vietnamese or from Vietnamese to English, but are strongly encouraged to attempt both directions. The software used should be self-developed or open source, reflecting the participants’ own research; otherwise the system will be classified as unconstrained.</p>
                <p>Unconstrained systems will not be human-evaluated or ranked. They include systems that use commercial translation products developed by people other than the participants (e.g. Systran products), systems in which free software developed by others plays a major part in the translation process (e.g. Google Translate or Bing), and systems that use data not provided by the workshop. You can, however, use other data and systems to demonstrate the improvements obtainable with larger data, and report them in your system paper.</p>
                <p>To ensure a fair evaluation, participants may be asked individually to provide more information about their system, e.g. the language model or translation model built from the data (if they use statistical machine translation).</p>
              <p><b>Evaluation Metrics</b></p>
                <p>BLEU score, computed with the script <a target="_blank" href="ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v13.pl">ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v13.pl</a><br>
                Human evaluation.</p>

              <p><b>Training and Test Data</b></p>
                <ul style="text-align: left; color: #10359D;">
                  <li>
                    <b>Parallel Training Corpus:</b><br>
                    Download link: <a href="vlspdata/mtdata.zip" target="_blank">download here</a>
                    <br>
                    TED talks' English-Vietnamese subtitle corpus.<br>
                    Download:<br>
                    <ul style="text-align: left; color: #10359D;">
                      <li>Training set: 212,454 sentence pairs from 875 talks (TBA).</li>
                      <li>Development set: 2,003 sentence pairs from 6 talks (TBA).</li>
                    </ul>
                  </li>
                  <li>
                    <b>Test Corpus:</b><br>
                    The test data for English to Vietnamese translation: <a href="vlspdata/ted.test2013.en2vi.en">download here</a> <br>
                    The test data for Vietnamese to English translation: <a href="vlspdata/ted.test2013.vi2en.vi">download here</a> <br>
                  </li>
                  <li>
                    <b>Monolingual Corpora:</b><br>
                    <ul style="text-align: left; color: #10359D;">
                      <li><b>English:</b> The English data are from the WMT 2013 webpage
                        <ul style="text-align: left; color: #10359D;">
                          <li>News commentary:<br> <a target="_blank" href="http://www.statmt.org/wmt13/training-monolingual-nc-v8.tgz">http://www.statmt.org/wmt13/training-monolingual-nc-v8.tgz</a></li>
                          <li>NewsCrawl 2012: <br><a target="_blank" href="http://www.statmt.org/wmt13/training-monolingual-news-2012.tgz">http://www.statmt.org/wmt13/training-monolingual-news-2012.tgz</a></li>
                          <li>English part of TED parallel corpus.</li>
                        </ul>
                      </li>
                      <li><b>Vietnamese:</b><br>
                        <ul style="text-align: left; color: #10359D;">
                          <li>Online news crawled from the Internet</li>
                          <li>Vietnamese part of TED parallel corpus.</li>
                        </ul>
                      </li>
                    </ul>
                  </li>
                </ul>
              <p><b>Data Format</b></p>
<ul style="text-align: left; color: #10359D;">

<li>Input format:
  <ul style="text-align: left; color: #10359D;">
    <li>For the TED parallel data, the training, development and test sets are provided as UTF-8 plain text, 1-to-1 sentence aligned, one “sentence” per line. Note that a “sentence” here is not necessarily a linguistic sentence; it may be a phrase.</li>
    <li>For the monolingual corpora, we provide UTF-8 plain text, one “sentence” per line.</li>
  </ul>
</li>
<li>Output format: UTF-8 plain text in precomposed Unicode, one sentence per line. Participants may choose any casing method in their preprocessing (truecasing, lowercasing, or leaving the case unchanged); for evaluation, the organizers will lowercase the final outputs and compare them to lowercased references.</li>
<li>You may want to use the following Moses scripts to tokenize texts and normalize their case before training:
  <ul style="text-align: left; color: #10359D;">
    <li>Tokenizer: tokenizer.perl</li>
    <li>Detokenizer: detokenizer.perl</li>
    <li>Lowercaser: lowercase.perl</li>
  </ul>
</li>
</ul>
<p>These tools are available in the Moses git repository: <br><a target="_blank" href="https://github.com/moses-smt/mosesdecoder">https://github.com/moses-smt/mosesdecoder</a></p>
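<p>The output requirements above (precomposed Unicode, lowercased comparison, one sentence per line with 1-to-1 alignment) can be checked with a short Python sketch. The function names are hypothetical, and using Unicode NFC normalization and <i>str.lower()</i> is an assumption about how to meet the stated format; the organizers' own evaluation pipeline may differ in detail.</p>
<pre>
```python
import unicodedata


def to_precomposed(line):
    """Return the line in precomposed (NFC) Unicode, without its newline."""
    return unicodedata.normalize("NFC", line.rstrip("\r\n"))


def lowercase_for_scoring(line):
    """Approximate the lowercasing applied before comparison with references."""
    return to_precomposed(line).lower()


def is_line_aligned(source_lines, target_lines):
    """Check the 1-to-1 requirement: both sides have the same number of lines."""
    return len(source_lines) == len(target_lines)


# "Tiếng Việt" written with combining marks (decomposed form).
decomposed = "Tie\u0302\u0301ng Vie\u0323\u0302t\n"
precomposed = to_precomposed(decomposed)
print(precomposed, len(precomposed))
print(lowercase_for_scoring(decomposed))
print(is_line_aligned(["Hello .", "Thank you ."], ["Xin chào .", "Cảm ơn ."]))
```
</pre>
<p>Normalizing before lowercasing means that visually identical Vietnamese strings compare equal even when they were typed with different combinations of base letters and combining marks.</p>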

              <p><b>Data Copyright – Acknowledgments</b></p>
              <p>TED makes its collection of video recordings and subtitles of talks available under the Creative Commons BY-NC-ND license (see <a target="_blank" href="http://www.ted.com/pages/talk_usage_policy">http://www.ted.com/pages/talk_usage_policy</a>). We acknowledge the authorship of TED talks (BY condition) and do not redistribute subtitles for commercial purposes (NC); the data are used for research purposes only. As regards the integrity of the work (ND), we change only the format of the container and preserve the original contents. Participants must conform to the TED Talks usage policy. We are not responsible for any violation by participants.</p>
                  </td>
                </tr>
              </table>
              <p>&nbsp;</p>
            </div>
            <div class="mc020303"></div>
          </div>
      </div>
    </div>
  </div>
</div>
 <div class="footer"> <img class="mnimgl" src="image/img_275.jpg" alt="" /> <img class="mnimgr" src="image/img_291.jpg" alt="" /> <a href="index.html">HOME</a>&nbsp;&nbsp;|&nbsp;<a href="CFP.html">CALL FOR PAPERS</a>&nbsp;&nbsp;|&nbsp;&nbsp;<a href="programme.html">Programme <img src="image/new.gif"></a>&nbsp;&nbsp;|&nbsp;<a href="submission.html">SUBMISSION</a>&nbsp;&nbsp;|&nbsp;&nbsp;<a href="registration.html">REGISTRATION</a>&nbsp;&nbsp;|&nbsp;&nbsp;<a href="contact.html">CONTACT US</a> <br/>
 <div class="footer"> <img class="mnimgl" src="image/img_275.jpg" alt="" />Contact Emails: <a href="mailto:nguyenthanhthuy@vnu.edu.vn">nguyenthanhthuy@vnu.edu.vn</a>  
  </div>
</div>
</body>
</html>